Fangzhi Xu

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Feb 05, 2026
Via arXiv

TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents

Feb 03, 2026
Via arXiv

SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization

Jan 30, 2026
Via arXiv

$A^3$-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation

Jan 14, 2026
Via arXiv

OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent

Jan 12, 2026
Via arXiv

A Foundation Model for Chest X-ray Interpretation with Grounded Reasoning via Online Reinforcement Learning

Sep 04, 2025
Via arXiv

Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning

Jul 29, 2025
Via arXiv

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

May 26, 2025
Via arXiv

ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding

May 25, 2025
Via arXiv

Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning

Apr 11, 2025
Via arXiv